Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors in IoT deployments generate data streams at high
velocity that carry information from a variety of domains and accumulate to
large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. Bridging these gaps
enables accessible analytics over data streams whose properties come from
different disciplines, and helps span the velocity (real-time) and volume
(persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
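The second gap above — one query spanning past (persistent) and present (real-time) events — can be illustrated with a minimal sequence matcher. This is a hedged sketch in plain Python, not SCEPter's query language; the event schema, predicates, and power readings are invented for illustration:

```python
from collections import deque

def match_sequence(events, pattern, window):
    """Naive CEP sequence matcher: report (start_ts, end_ts) whenever the
    predicates in `pattern` are satisfied in order by events whose
    timestamps all fall within `window` time units. Because `events` is
    just an iterable, a replayed historic stream can be concatenated with
    a live stream, letting one query span both temporal spaces."""
    matches = []
    partials = deque()  # (start_ts, index of next predicate to satisfy)
    for ts, payload in events:
        advanced = deque()
        for start, idx in partials:
            if ts - start > window:
                continue  # partial match expired outside the window
            if pattern[idx](payload):
                if idx + 1 == len(pattern):
                    matches.append((start, ts))
                else:
                    advanced.append((start, idx + 1))
            else:
                advanced.append((start, idx))
        if pattern[0](payload):  # any event may begin a fresh instance
            if len(pattern) == 1:
                matches.append((ts, ts))
            else:
                advanced.append((ts, 1))
        partials = advanced
    return matches

# Historic (persistent) events replayed ahead of live (real-time) ones.
historic = [(1, {"kW": 2.0}), (2, {"kW": 9.5})]
live = [(3, {"kW": 9.8}), (10, {"kW": 1.0})]
two_spikes = [lambda e: e["kW"] > 9, lambda e: e["kW"] > 9]
print(match_sequence(historic + live, two_spikes, window=5))  # [(2, 3)]
```

A real CEP engine would index partial matches and push predicate evaluation down to the storage layer; the point here is only that the same pattern runs unchanged across the stitched historic-plus-live stream.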
Holistic Measures for Evaluating Prediction Models in Smart Grids
The performance of prediction models is often based on "abstract metrics"
that estimate the model's ability to limit residual errors between the observed
and predicted values. However, meaningful evaluation and selection of
prediction models for end-user domains requires holistic and
application-sensitive performance measures. Inspired by energy consumption
prediction models used in the emerging "big data" domain of Smart Power Grids,
we propose a suite of performance measures to rationally compare models along
the dimensions of scale independence, reliability, volatility and cost. We
include both application independent and dependent measures, the latter
parameterized to allow customization by domain experts to fit their scenario.
While our measures are generalizable to other domains, we offer an empirical
analysis using real energy use data for three Smart Grid applications:
planning, customer education and demand response, which are relevant for energy
sustainability. Our results underscore the value of the proposed measures in
offering deeper insight into model behavior and its impact on real
applications, which benefits both data mining researchers and practitioners.
Comment: 14 pages, 8 figures. Accepted and to appear in IEEE Transactions on Knowledge and Data Engineering, 2014. Authors' final version. Copyright transferred to IEEE.
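The abstract does not spell out the measure suite itself, so the following is only an illustrative sketch of the scale-independence dimension, contrasting a scale-dependent metric (RMSE) with a scale-independent one (MAPE) on invented consumption numbers:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error: scale-dependent, grows with magnitude."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mape(obs, pred):
    """Mean absolute percentage error: scale-independent."""
    return float(np.mean(np.abs((obs - pred) / obs)) * 100)

# Identical relative errors at two consumption scales (a home vs. a campus).
house_obs = np.array([10.0, 12.0])
house_pred = np.array([11.0, 13.2])
campus_obs, campus_pred = house_obs * 1000, house_pred * 1000

print(rmse(house_obs, house_pred), rmse(campus_obs, campus_pred))  # differ 1000x
print(mape(house_obs, house_pred), mape(campus_obs, campus_pred))  # both ~10.0
```

RMSE ranks the campus model's errors 1000x worse despite identical relative accuracy, which is why comparing models across customers of different sizes calls for scale-independent measures.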
ME-ViT: A Single-Load Memory-Efficient FPGA Accelerator for Vision Transformers
Vision Transformers (ViTs) have emerged as a state-of-the-art solution for
object classification tasks. However, their computational demands and high
parameter count make them unsuitable for real-time inference, prompting the
need for efficient hardware implementations. Existing hardware accelerators for
ViTs suffer from frequent off-chip memory access, restricting the achievable
throughput by memory bandwidth. In devices with a high compute-to-communication
ratio (e.g., edge FPGAs with limited bandwidth), off-chip memory access imposes
a severe bottleneck on overall throughput. This work proposes ME-ViT, a novel
Memory-Efficient FPGA accelerator for ViT inference that minimizes memory
traffic. We propose a single-load policy in designing ME-ViT: model parameters
are only loaded once, intermediate results are stored on-chip, and all
operations are implemented in a single processing element. To achieve this
goal, we design a memory-efficient processing element (ME-PE), which processes
multiple key operations of ViT inference on the same architecture through the
reuse of multi-purpose buffers. We also integrate the Softmax and LayerNorm
functions into the ME-PE, minimizing stalls between matrix multiplications. We
evaluate ME-ViT on systolic array sizes of 32 and 16, achieving up to a 9.22x
and 17.89x overall improvement in memory bandwidth, and a 2.16x improvement in
throughput per DSP for both designs over state-of-the-art ViT accelerators on
FPGA. ME-ViT achieves a power efficiency improvement of up to 4.00x (1.03x)
over a GPU (FPGA) baseline. ME-ViT enables up to 5 ME-PE instantiations on a
Xilinx Alveo U200, achieving a 5.10x improvement in throughput over the
state-of-the-art FPGA baseline, and a 5.85x (1.51x) improvement in power
efficiency over the GPU (FPGA) baseline.
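The payoff of a single-load policy can be sketched with back-of-envelope memory-traffic arithmetic. All numbers below (parameter size, activation size, weight re-fetch factor) are hypothetical stand-ins, not figures from the paper:

```python
MB = 1_000_000

def offchip_traffic(param_bytes, act_bytes_per_layer, layers,
                    weight_refetches, spill_intermediates):
    """Back-of-envelope off-chip traffic model (illustrative only).
    weight_refetches: how many times each parameter byte crosses the
    memory bus; spill_intermediates: whether per-layer activations are
    written off-chip and read back between layers."""
    traffic = weight_refetches * param_bytes
    if spill_intermediates:
        traffic += 2 * layers * act_bytes_per_layer  # write out + read back
    return traffic

# Hypothetical ViT-Base-like numbers: 86M fp16 parameters, 12 encoder
# layers, 0.6 MB of activations per layer, and a 4x weight re-fetch when
# on-chip buffers cannot hold a full layer's weights.
params = 86_000_000 * 2
acts = 600_000
baseline = offchip_traffic(params, acts, 12,
                           weight_refetches=4, spill_intermediates=True)
single_load = offchip_traffic(params, acts, 12,
                              weight_refetches=1, spill_intermediates=False)
print(f"estimated traffic reduction: {baseline / single_load:.2f}x")
```

Under these assumed numbers the single-load design moves roughly 4x fewer bytes over the memory bus, which is the mechanism behind the bandwidth improvements the abstract reports.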
Dynasor: A Dynamic Memory Layout for Accelerating Sparse MTTKRP for Tensor Decomposition on Multi-core CPU
Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the most
time-consuming compute kernel in sparse tensor decomposition. In this paper, we
introduce a novel algorithm to minimize the execution time of spMTTKRP across
all modes of an input tensor on multi-core CPU platforms. The proposed algorithm
leverages the FLYCOO tensor format to exploit data locality in external memory
accesses. It effectively utilizes computational resources by enabling lock-free
concurrent processing of independent partitions of the input tensor. The
proposed partitioning ensures load balancing among CPU threads. Our dynamic
tensor remapping technique leads to reduced communication overhead along all
the modes. On widely used real-world tensors, our work achieves a 2.12x - 9.01x
speedup in total execution time across all modes compared with the
state-of-the-art CPU implementations.
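For readers unfamiliar with the kernel, a reference mode-0 spMTTKRP over a COO-format sparse tensor looks like the following. This is a plain NumPy sketch of the operation being accelerated, not the FLYCOO-based implementation:

```python
import numpy as np

def spmttkrp_mode0(coords, vals, B, C, num_rows):
    """Mode-0 MTTKRP on a COO sparse tensor X (shape I x J x K):
    for each nonzero X[i, j, k], accumulate
        M[i, :] += X[i, j, k] * (B[j, :] * C[k, :])
    where B (J x R) and C (K x R) are the factor matrices."""
    M = np.zeros((num_rows, B.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        M[i] += v * B[j] * C[k]  # elementwise row product (Khatri-Rao row)
    return M

# Tiny 2x2x2 tensor with three nonzeros, decomposition rank R = 2.
coords = [(0, 0, 1), (1, 1, 0), (1, 0, 1)]
vals = [2.0, 3.0, 1.0]
B = np.array([[1.0, 2.0], [0.5, 1.0]])
C = np.array([[1.0, 1.0], [2.0, 0.5]])
print(spmttkrp_mode0(coords, vals, B, C, num_rows=2))
```

Each nonzero touches one row of each factor matrix, so the access pattern is irregular; tensor formats like FLYCOO exist precisely to restore locality and allow lock-free partitioning of this loop.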
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
Large scale graph processing is a major research area for Big Data
exploration. Vertex centric programming models like Pregel are gaining traction
due to their simple abstraction that allows for scalable execution on
distributed systems naturally. However, this approach has limitations that
cause vertex centric algorithms to under-perform due to a poor
compute-to-communication ratio and slow convergence across iterative
supersteps. In this paper we introduce GoFFish, a scalable sub-graph centric framework
co-designed with a distributed persistent graph storage for large scale graph
analytics on commodity clusters. We introduce a sub-graph centric programming
abstraction that combines the scalability of a vertex centric approach with the
flexibility of shared memory sub-graph computation. We map Connected
Components, SSSP and PageRank algorithms to this model to illustrate its
flexibility. Further, we empirically analyze GoFFish using several real world
graphs and demonstrate its significant performance improvement, orders of
magnitude in some cases, compared to Apache Giraph, the leading open source
vertex centric implementation.
Comment: Under review by a conference, 201
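The sub-graph centric model can be sketched as follows: each superstep runs a full shared-memory algorithm inside every local sub-graph, and only labels on cut edges are exchanged between partitions. This is a hedged single-process illustration with an invented partition layout, not the GoFFish API:

```python
def connected_components(partitions, cut_edges):
    """Subgraph-centric connected components. `partitions` is a list of
    adjacency dicts (one per sub-graph); `cut_edges` join vertices in
    different partitions. Convergence takes on the order of the
    partition graph's diameter, not the vertex graph's."""
    label = {v: v for adj in partitions for v in adj}  # own id as label
    supersteps, changed = 0, True
    while changed:
        changed = False
        for adj in partitions:  # local phase: shared-memory min-label
            local = True
            while local:
                local = False
                for u in adj:
                    for v in adj[u]:
                        lo = min(label[u], label[v])
                        if label[u] != lo or label[v] != lo:
                            label[u] = label[v] = lo
                            local = changed = True
        for u, v in cut_edges:  # communication phase: cut edges only
            lo = min(label[u], label[v])
            if label[u] != lo or label[v] != lo:
                label[u] = label[v] = lo
                changed = True
        supersteps += 1
    return label, supersteps

# Two partitions {1, 2, 3} and {4, 5}, joined by one cut edge (3, 4).
parts = [{1: [2], 2: [1, 3], 3: [2]}, {4: [5], 5: [4]}]
labels, steps = connected_components(parts, [(3, 4)])
print(labels)  # every vertex collapses to component label 1
```

Because each sub-graph resolves its labels fully in memory per superstep, messages flow only along cut edges, in contrast to a vertex-centric run where every edge carries a message in every superstep.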